Goto

Collaborating Authors

 hierarchical reinforcement learning


Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Neural Information Processing Systems

Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that are unadaptable, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we also theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in Mujoco domains. We also find both low-level and high-level policies trained by our algorithm transferable.


DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning

Neural Information Processing Systems

Hierarchical Reinforcement Learning (HRL) has made notable progress in complex control tasks by leveraging temporal abstraction. However, previous HRL algorithms often suffer from serious data inefficiency as environments get large. The extended components, $i.e.$, goal space and length of episodes, impose a burden on either one or both high-level and low-level policies since both levels share the total horizon of the episode. In this paper, we present a method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement Learning (DHRL) which can alleviate this problem by decoupling the horizons of high-level and low-level policies and bridging the gap between the length of both horizons using a graph. DHRL provides a freely stretchable high-level action interval, which facilitates longer temporal abstraction and faster training in complex tasks. Our method outperforms state-of-the-art HRL algorithms in typical HRL environments. Moreover, DHRL achieves long and complex locomotion and manipulation tasks.


Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem

Housseini, Ali Al, Rottondi, Cristina, Ayoub, Omran

arXiv.org Artificial Intelligence

Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to \textbf{20.7\%}, total revenue by up to \textbf{36.2\%}, and revenue-over-cost by up to \textbf{22.1\%}. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.


Convergence and stability of Q-learning in Hierarchical Reinforcement Learning

Manenti, Massimiliano, Iannelli, Andrea

arXiv.org Artificial Intelligence

Decision-making architectures have played a central role for decades [1] both in engineering and other domains, e.g., guidance, navigation and control of Apollo missions [2], chemical plants [3], smart grids [4], unmanned aerial vehicles [5], recommender systems [6], and algorithms [7]. Moreover, architectures are ubiquitous in nature, e.g., diversity in the nervous system enables humans to have fast and accurate sensorimotor control [8]. Reinforcement Learning (RL) is a framework in which an agent learns to make sequential decisions through interaction with an environment in order to maximize cumulative reward [9]. Decision-making architectures have also been proposed and studied in RL. Hierarchical Reinforcement Learning (HRL) is a subfield of RL that deals with hierarchical structures for decision-making agents. Prospective advantages include improved long-term credit assignment, continual learning, interpretability, and the integration of preexisting policies [10], [11].




Incorporating Spatial Information into Goal-Conditioned Hierarchical Reinforcement Learning via Graph Representations

Zhang, Shuyuan, Wang, Zihan, Chang, Xiao-Wen, Precup, Doina

arXiv.org Artificial Intelligence

The integration of graphs with Goal-conditioned Hierarchical Reinforcement Learning (GCHRL) has recently gained attention, as intermediate goals (subgoals) can be effectively sampled from graphs that naturally represent the overall task structure in most RL tasks. However, existing approaches typically rely on domain-specific knowledge to construct these graphs, limiting their applicability to new tasks. Other graph-based approaches create graphs dynamically during exploration but struggle to fully utilize them, because they have problems passing the information in the graphs to newly visited states. Additionally, current GCHRL methods face challenges such as sample inefficiency and poor subgoal representation. This paper proposes a solution to these issues by developing a graph encoder-decoder to evaluate unseen states. Our proposed method, Graph-Guided sub-Goal representation Generation RL (G4RL), can be incorporated into any existing GCHRL method when operating in environments with primarily symmetric and reversible transitions to enhance performance across this class of problems. We show that the graph encoder-decoder can be effectively implemented using a network trained on the state graph generated during exploration. Empirical results indicate that leveraging high and low-level intrinsic rewards from the graph encoder-decoder significantly enhances the performance of state-of-the-art GCHRL approaches with an extra small computational cost in dense and sparse reward environments.